Handle API Status errors 499 differently#132

Closed
p- wants to merge 7 commits into main from p--handle-499

Conversation

@p-
Contributor

@p- p- commented Jan 20, 2026

No description provided.

p- added 2 commits January 20, 2026 16:38
Added APIStatusError handling to manage transient errors during API calls.
Copilot AI review requested due to automatic review settings January 20, 2026 15:43
@p- p- requested a review from Kwstubbs as a code owner January 20, 2026 15:43
Contributor

Copilot AI left a comment


Pull request overview

This PR adds special handling for HTTP 499 status errors from the OpenAI API, treating them as transient errors that should be retried similar to timeout errors.

Changes:

  • Imports APIStatusError from the openai library to catch API status errors
  • Adds retry logic for 499 status code errors (client closed request/upstream cancelled)
  • Updates error logging message from "Max retries for APITimeoutError reached" to the more generic "Max API retries reached"
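The retry behavior described above can be sketched as follows. This is a minimal, self-contained sketch, not code from the repository: the exception classes are stand-ins for the real openai types, and `call_with_retries` is a hypothetical wrapper around the inline loop in main.py.

```python
import logging

# Stand-ins for openai's exception types so the sketch runs on its own.
class APIStatusError(Exception):
    def __init__(self, status_code):
        super().__init__(f"status {status_code}")
        self.status_code = status_code

class APITimeoutError(Exception):
    pass

def call_with_retries(call, max_retry=3):
    """Retry timeouts and 499 ('client closed request') errors; propagate the rest."""
    while True:
        try:
            return call()
        except APITimeoutError:
            if not max_retry:
                logging.error("Max API retries reached")
                raise
            max_retry -= 1
        except APIStatusError as e:
            if getattr(e, "status_code", None) != 499:
                raise  # non-499 status errors are not treated as transient
            if not max_retry:
                logging.error("Max API retries reached")
                raise
            max_retry -= 1

# Example: a call that fails twice with 499, then succeeds on the third attempt.
attempts = {"n": 0}
def flaky():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise APIStatusError(499)
    return "ok"

result = call_with_retries(flaky)  # retries twice, then returns "ok"
```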


Comment on lines +340 to 348
except APIStatusError as e:
    # Retry transient “client closed request / upstream cancelled” style errors
    if getattr(e, "status_code", None) != 499:
        raise  # propagate non-499 errors
    # 499: retry
    if not max_retry:
        logging.error(f"Max API retries reached")
        raise
    max_retry -= 1

Copilot AI Jan 20, 2026


The new APIStatusError exception handler should be added to the outer exception handling block (lines 361-381) to provide consistent error handling and user feedback. Currently, if a non-499 APIStatusError is raised and propagated from the inner try-except block, it won't be caught by any outer handler, potentially resulting in an unhandled exception. Consider adding an exception handler for APIStatusError similar to the existing handlers for BadRequestError and APITimeoutError.

Contributor


Yea probably worth adding a handler in the outer scope like APITimeoutError

Comment on lines +340 to 348
except APIStatusError as e:
    # Retry transient “client closed request / upstream cancelled” style errors
    if getattr(e, "status_code", None) != 499:
        raise  # propagate non-499 errors
    # 499: retry
    if not max_retry:
        logging.error(f"Max API retries reached")
        raise
    max_retry -= 1

Copilot AI Jan 20, 2026


The new APIStatusError retry logic for 499 status codes lacks test coverage. Since the repository includes tests for other functionality (test_api_endpoint_config.py, test_cli_parser.py, test_yaml_parser.py), consider adding tests to verify that 499 errors are properly retried and that non-499 APIStatusErrors are propagated correctly.
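An illustrative test along these lines might look as follows. This is a self-contained sketch, not code from the repository: `APIStatusError` here is a stand-in for the openai class, and `retry_499` is a hypothetical helper mirroring the 499 logic under review (the real loop lives inline in main.py, so testing it directly would require refactoring or mocking the API client).

```python
# Hypothetical stand-ins so the test pattern is runnable on its own.
class APIStatusError(Exception):
    def __init__(self, status_code):
        super().__init__(f"status {status_code}")
        self.status_code = status_code

def retry_499(call, max_retry=3):
    """Mirrors the PR's logic: retry on 499, propagate everything else."""
    while True:
        try:
            return call()
        except APIStatusError as e:
            if e.status_code != 499 or not max_retry:
                raise
            max_retry -= 1

def test_499_is_retried():
    calls = []
    def flaky():
        calls.append(1)
        if len(calls) < 2:
            raise APIStatusError(499)  # first attempt fails with 499
        return "ok"
    assert retry_499(flaky) == "ok"
    assert len(calls) == 2  # one retry consumed

def test_non_499_propagates():
    def boom():
        raise APIStatusError(500)
    try:
        retry_499(boom)
        assert False, "expected APIStatusError(500) to propagate"
    except APIStatusError as e:
        assert e.status_code == 500
```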

m-y-mo previously approved these changes Jan 20, 2026
@p-
Contributor Author

p- commented Jan 20, 2026

smoke test

@github-actions

Deployment Triggered 🚀

p-, started a branch deployment to smoketest (branch: p--handle-499)

You can watch the progress here 🔗

Details
{
  "type": "branch",
  "environment": {
    "name": "smoketest",
    "url": null
  },
  "deployment": {
    "timestamp": "2026-01-20T16:27:06.876Z",
    "logs": "https://github.com/GitHubSecurityLab/seclab-taskflow-agent/actions/runs/21179132080"
  },
  "git": {
    "branch": "p--handle-499",
    "commit": "c38b939e7ecb6d588fae9f5f5e223a346feca4cc",
    "verified": true,
    "committer": "web-flow",
    "html_url": "https://github.com/GitHubSecurityLab/seclab-taskflow-agent/commit/c38b939e7ecb6d588fae9f5f5e223a346feca4cc"
  },
  "context": {
    "actor": "p-",
    "noop": false,
    "fork": false,
    "comment": {
      "created_at": "2026-01-20T16:26:48Z",
      "updated_at": "2026-01-20T16:26:48Z",
      "body": "smoke test",
      "html_url": "https://github.com/GitHubSecurityLab/seclab-taskflow-agent/pull/132#issuecomment-3773810578"
    }
  },
  "parameters": {
    "raw": null,
    "parsed": null
  }
}

@github-actions

Deployment Results ✅

p- successfully deployed branch p--handle-499 to smoketest

Details
{
  "status": "success",
  "environment": {
    "name": "smoketest",
    "url": null
  },
  "deployment": {
    "id": 3666688678,
    "timestamp": "2026-01-20T16:27:08.716Z",
    "logs": "https://github.com/GitHubSecurityLab/seclab-taskflow-agent/actions/runs/21179132080",
    "duration": 2
  },
  "git": {
    "branch": "p--handle-499",
    "commit": "c38b939e7ecb6d588fae9f5f5e223a346feca4cc",
    "verified": true
  },
  "context": {
    "actor": "p-",
    "noop": false,
    "fork": false
  },
  "reviews": {
    "count": 1,
    "decision": "APPROVED"
  },
  "parameters": {
    "raw": null,
    "parsed": null
  }
}

Copilot AI review requested due to automatic review settings January 20, 2026 16:51
@p-
Contributor Author

p- commented Jan 20, 2026

smoke test

@github-actions
Copy link
Copy Markdown

⚠️ Cannot proceed with deployment

  • reviewDecision: REVIEW_REQUIRED
  • commitStatus: PENDING

CI checks must be passing and the PR must be approved in order to continue

Contributor

Copilot AI left a comment


Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated 1 comment.



Comment on lines +382 to +386
except APIStatusError as e:
    await render_model_output(f"** 🤖❗ API Status Error: {e}\n",
                              async_task=async_task,
                              task_id=task_id)
    logging.error(f"API Status Error: Status={e.status_code}, Response={e.response}")

Copilot AI Jan 20, 2026


The APIStatusError exception handler is placed before the more specific exception types (BadRequestError, APITimeoutError, RateLimitError) in the exception hierarchy. Since these specific exceptions inherit from APIStatusError, this catch block will intercept them before they reach their intended handlers above (lines 372-381). The APIStatusError handler should be moved to the end of the exception chain, after all the more specific handlers.
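The ordering issue can be shown with a short self-contained sketch using dummy classes that mirror the relevant part of the openai hierarchy (in openai-python, BadRequestError and RateLimitError subclass APIStatusError; APITimeoutError, if memory serves, derives from APIConnectionError instead, so the shadowing concern applies mainly to the status-error subclasses). Python matches `except` clauses top to bottom, so a base-class handler listed first intercepts its subclasses:

```python
# Dummy hierarchy for illustration; real classes live in the openai package.
class APIStatusError(Exception):
    pass

class BadRequestError(APIStatusError):
    pass

def handle(exc):
    try:
        raise exc
    except BadRequestError:
        return "bad_request handler"   # reached because it is listed first
    except APIStatusError:
        return "generic status handler"

def handle_wrong_order(exc):
    try:
        raise exc
    except APIStatusError:
        return "generic status handler"  # also catches every subclass
    except BadRequestError:
        return "bad_request handler"     # unreachable

# Correct order: the specific handler fires for the subclass.
assert handle(BadRequestError()) == "bad_request handler"
assert handle(APIStatusError()) == "generic status handler"
# Wrong order: the base-class handler swallows the subclass.
assert handle_wrong_order(BadRequestError()) == "generic status handler"
```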

kevinbackhouse previously approved these changes Jan 27, 2026
Copilot AI review requested due to automatic review settings February 9, 2026 17:41
Contributor

Copilot AI left a comment


Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated 4 comments.

Comments suppressed due to low confidence (1)

src/seclab_taskflow_agent/main.py:357

  • In the APITimeoutError retry loop, the max-retries case now logs with logging.error and immediately re-raises, which drops the original stack trace/context compared to logging.exception and also loses the actual exception message (since it’s not captured as as e). Capture the exception and log with traceback (or include the exception details) before raising so failures remain diagnosable.
                    except APITimeoutError:
                        if not max_retry:
                            logging.error(f"Max API retries reached")
                            raise
                        max_retry -= 1


await render_model_output(f"** 🤖❗ API Status Error: {e}\n",
                          async_task=async_task,
                          task_id=task_id)
logging.error(f"API Status Error: Status={e.status_code}, Response={e.response}")

Copilot AI Feb 9, 2026


logging.error(f"API Status Error: Status={e.status_code}, Response={e.response}") logs the full response object, which can be very large and may include sensitive payloads. Prefer logging a minimal, stable subset (e.g., status_code + request id / error type/message) to avoid leaking content and to keep logs bounded.

Suggested change:
  - logging.error(f"API Status Error: Status={e.status_code}, Response={e.response}")
  + logging.error(
  +     "API Status Error: status=%s, message=%s",
  +     getattr(e, "status_code", "unknown"),
  +     getattr(e, "message", str(e)),
  + )

Comment on lines +358 to 366
except APIStatusError as e:
    # Retry transient “client closed request / upstream cancelled” style errors
    if getattr(e, "status_code", None) != 499:
        raise  # propagate non-499 errors
    # 499: retry
    if not max_retry:
        logging.error(f"Max API retries reached")
        raise
    max_retry -= 1

Copilot AI Feb 9, 2026


In the APIStatusError(499) retry path, the max-retries case logs with logging.error and re-raises without preserving traceback/context. Consider logging with traceback (or include exception details) before raising so repeated 499 failures are diagnosable.

This issue also appears on line 353 of the same file.
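The difference the comment is pointing at can be demonstrated with a small self-contained sketch: `logging.exception` (or `logging.error(..., exc_info=True)`) attaches the active traceback to the log record, while a plain `logging.error` records only the message. The logger name and message below are illustrative, not from the repository:

```python
import io
import logging

buf = io.StringIO()
logger = logging.getLogger("retry_demo")
logger.addHandler(logging.StreamHandler(buf))
logger.propagate = False  # keep the demo's output confined to `buf`

try:
    raise ValueError("simulated 499 after retries exhausted")
except ValueError:
    logger.error("Max API retries reached")      # message only, no traceback
    logger.exception("Max API retries reached")  # message plus full traceback

out = buf.getvalue()
# Only the second record carries a "Traceback (most recent call last):" section,
# including the original exception type and message.
```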

Comment on lines +21 to 25
from agents import Tool, RunContextWrapper, TContext, Agent
from openai import BadRequestError, APITimeoutError, RateLimitError, APIStatusError
from agents.mcp import MCPServerSse, MCPServerStdio, MCPServerStreamableHttp, create_static_tool_filter
from dotenv import find_dotenv, load_dotenv
from openai import APITimeoutError, BadRequestError, RateLimitError

Copilot AI Feb 9, 2026


Duplicate imports were introduced: Agent/RunContextWrapper/TContext/Tool are imported twice (line 15 and 21) and OpenAI exception classes are imported twice (line 22 and 25). Consolidate into a single import per module to avoid lint failures and keep imports maintainable.

Suggested change:
  - from agents import Tool, RunContextWrapper, TContext, Agent
  - from openai import BadRequestError, APITimeoutError, RateLimitError, APIStatusError
  - from agents.mcp import MCPServerSse, MCPServerStdio, MCPServerStreamableHttp, create_static_tool_filter
  - from dotenv import find_dotenv, load_dotenv
  - from openai import APITimeoutError, BadRequestError, RateLimitError
  + from openai import BadRequestError, APITimeoutError, RateLimitError, APIStatusError
  + from agents.mcp import MCPServerSse, MCPServerStdio, MCPServerStreamableHttp, create_static_tool_filter
  + from dotenv import find_dotenv, load_dotenv

await render_model_output(f"** 🤖❗ Timeout Error: {e}\n",
                          async_task=async_task,
                          task_id=task_id)
logging.error(f"Bad Request: {e}")

Copilot AI Feb 9, 2026


The APITimeoutError outer handler logs Bad Request: {e} even though this is a timeout path. This is misleading when debugging and in logs/telemetry; update the log message to reflect a timeout (and consider logging.exception if you want traceback).

Suggested change:
  - logging.error(f"Bad Request: {e}")
  + logging.exception("Timeout Error")

@p-
Contributor Author

p- commented Feb 12, 2026

Shelving this fix. Never hit that issue again, and the fix is too specific.

@p- p- closed this Feb 12, 2026